Project overview:¶

This project aims to help the social media conglomerate make data-driven decisions to enhance its content strategy, prepare for an IPO, and improve its big data practices, ultimately supporting growth in the highly competitive social media industry.

The first part of the project involves data analysis to understand the content categories that are most popular on the social media platform. This analysis will involve processing large volumes of data to identify patterns, trends, and user preferences. The goal is to determine the top 5 content categories that have the highest aggregate popularity based on metrics such as likes, shares, comments, and engagement.

The expected deliverables are: the identification of the top 5 content categories with the highest aggregate popularity on the platform; insights into user preferences, engagement patterns, and content trends that can inform the company's content strategy; and recommendations on how to optimize content creation and promotion to increase user engagement and retention.

The raw data used in this project is available on GitHub.

In [1]:
#all the libraries to be used will be imported first:

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
import datetime
In [2]:
#the next step is to examine the Content table for details of the uploaded content:

df1 = pd.read_csv('Content.csv')
df1
Out[2]:
Unnamed: 0 Content ID User ID Type Category URL
0 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 8d3cd87d-8a31-4935-9a4f-b319bfe05f31 photo Studying https://socialbuzz.cdn.com/content/storage/975...
1 1 9f737e0a-3cdd-4d29-9d24-753f4e3be810 beb1f34e-7870-46d6-9fc7-2e12eb83ce43 photo healthy eating https://socialbuzz.cdn.com/content/storage/9f7...
2 2 230c4e4d-70c3-461d-b42c-ec09396efb3f a5c65404-5894-4b87-82f2-d787cbee86b4 photo healthy eating https://socialbuzz.cdn.com/content/storage/230...
3 3 356fff80-da4d-4785-9f43-bc1261031dc6 9fb4ce88-fac1-406c-8544-1a899cee7aaf photo technology https://socialbuzz.cdn.com/content/storage/356...
4 4 01ab84dd-6364-4236-abbb-3f237db77180 e206e31b-5f85-4964-b6ea-d7ee5324def1 video food https://socialbuzz.cdn.com/content/storage/01a...
... ... ... ... ... ... ...
995 995 b4cef9ef-627b-41d7-a051-5961b0204ebb 5b62e10e-3c19-4d28-a57c-e9bdc3d6758d video public speaking NaN
996 996 7a79f4e4-3b7d-44dc-bdef-bc990740252c 4fe420fa-a193-4408-bd5d-62a020233609 GIF technology https://socialbuzz.cdn.com/content/storage/7a7...
997 997 435007a5-6261-4d8b-b0a4-55fdc189754b 35d6a1f3-e358-4d4b-8074-05f3b7f35c2a audio veganism https://socialbuzz.cdn.com/content/storage/435...
998 998 4e4c9690-c013-4ee7-9e66-943d8cbd27b7 b9bcd994-f000-4f6b-87fc-caae08acfaa1 GIF culture https://socialbuzz.cdn.com/content/storage/4e4...
999 999 75d6b589-7fae-4a6d-b0d0-752845150e56 b8c653b5-0118-4d7e-9bde-07c2de90f0ff audio technology https://socialbuzz.cdn.com/content/storage/75d...

1000 rows × 6 columns

In [3]:
#because we do not need the 'URL' and 'User ID' details, we are going to drop them:

df1d = df1.drop(columns = ['URL','User ID'])
In [4]:
df1d.head(5)
Out[4]:
Unnamed: 0 Content ID Type Category
0 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo Studying
1 1 9f737e0a-3cdd-4d29-9d24-753f4e3be810 photo healthy eating
2 2 230c4e4d-70c3-461d-b42c-ec09396efb3f photo healthy eating
3 3 356fff80-da4d-4785-9f43-bc1261031dc6 photo technology
4 4 01ab84dd-6364-4236-abbb-3f237db77180 video food
In [5]:
print(df1d.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  1000 non-null   int64 
 1   Content ID  1000 non-null   object
 2   Type        1000 non-null   object
 3   Category    1000 non-null   object
dtypes: int64(1), object(3)
memory usage: 31.4+ KB
None
In [6]:
#we list the unique values in the Category column to see which category each piece of content belongs to:

df1d['Category'].unique()
Out[6]:
array(['Studying', 'healthy eating', 'technology', 'food', 'cooking',
       'dogs', 'soccer', 'public speaking', 'science', 'tennis', 'travel',
       'fitness', 'education', 'studying', 'veganism', 'Animals',
       'animals', 'culture', '"culture"', 'Fitness', '"studying"',
       'Veganism', '"animals"', 'Travel', '"soccer"', 'Education',
       '"dogs"', 'Technology', 'Soccer', '"tennis"', 'Culture', '"food"',
       'Food', '"technology"', 'Healthy Eating', '"cooking"', 'Science',
       '"public speaking"', '"veganism"', 'Public Speaking', '"science"'],
      dtype=object)

Data cleaning of Content table¶

In [7]:
#the unique categories need to be cleaned: the letter case is mixed and the quotation marks need to be removed.
#first, we unify the letter case using the .str.lower() method:

df1d['Category'] = df1d['Category'].str.lower()
In [8]:
df1d['Category'].unique()
Out[8]:
array(['studying', 'healthy eating', 'technology', 'food', 'cooking',
       'dogs', 'soccer', 'public speaking', 'science', 'tennis', 'travel',
       'fitness', 'education', 'veganism', 'animals', 'culture',
       '"culture"', '"studying"', '"animals"', '"soccer"', '"dogs"',
       '"tennis"', '"food"', '"technology"', '"cooking"',
       '"public speaking"', '"veganism"', '"science"'], dtype=object)
In [9]:
#the next action is to remove the quotation marks; a single vectorised .str.strip() call replaces twelve individual .replace() calls:

df1d['Category'] = df1d['Category'].str.strip('"')
In [10]:
df1d['Category'].unique()
Out[10]:
array(['studying', 'healthy eating', 'technology', 'food', 'cooking',
       'dogs', 'soccer', 'public speaking', 'science', 'tennis', 'travel',
       'fitness', 'education', 'veganism', 'animals', 'culture'],
      dtype=object)
In [11]:
#now that the Category column has been cleaned, we check for null values:

df1d.isnull().sum()
Out[11]:
Unnamed: 0    0
Content ID    0
Type          0
Category      0
dtype: int64
In [12]:
#we rename the Type column to ContentType for readability:

df1d.rename(columns = {'Type':'ContentType'}, inplace = True)
In [13]:
df1d.head(5)
Out[13]:
Unnamed: 0 Content ID ContentType Category
0 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying
1 1 9f737e0a-3cdd-4d29-9d24-753f4e3be810 photo healthy eating
2 2 230c4e4d-70c3-461d-b42c-ec09396efb3f photo healthy eating
3 3 356fff80-da4d-4785-9f43-bc1261031dc6 photo technology
4 4 01ab84dd-6364-4236-abbb-3f237db77180 video food

Reaction table¶

In [14]:
#the next table in this project is the Reactions table, which records the reactions to each piece of content posted:

df2 = pd.read_csv('Reactions.csv')
df2
Out[14]:
Unnamed: 0 Content ID User ID Type Datetime
0 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 NaN NaN 2021-04-22 15:17:15
1 1 97522e57-d9ab-4bd6-97bf-c24d952602d2 5d454588-283d-459d-915d-c48a2cb4c27f disgust 2020-11-07 09:43:50
2 2 97522e57-d9ab-4bd6-97bf-c24d952602d2 92b87fa5-f271-43e0-af66-84fac21052e6 dislike 2021-06-17 12:22:51
3 3 97522e57-d9ab-4bd6-97bf-c24d952602d2 163daa38-8b77-48c9-9af6-37a6c1447ac2 scared 2021-04-18 05:13:58
4 4 97522e57-d9ab-4bd6-97bf-c24d952602d2 34e8add9-0206-47fd-a501-037b994650a2 disgust 2021-01-06 19:13:01
... ... ... ... ... ...
25548 25548 75d6b589-7fae-4a6d-b0d0-752845150e56 80c9ce48-46f9-4f5e-b3ca-3b698fc2e949 dislike 2020-06-27 09:46:48
25549 25549 75d6b589-7fae-4a6d-b0d0-752845150e56 2bd9c167-e06c-47c1-a978-3403d6724606 intrigued 2021-02-16 17:17:02
25550 25550 75d6b589-7fae-4a6d-b0d0-752845150e56 NaN interested 2020-09-12 03:54:58
25551 25551 75d6b589-7fae-4a6d-b0d0-752845150e56 5ffd8b51-164e-47e2-885e-8b8c46eb63ed worried 2020-11-04 20:08:31
25552 25552 75d6b589-7fae-4a6d-b0d0-752845150e56 4edc3d1a-a7d9-4db6-89c3-f784d9954172 cherish 2021-01-04 04:55:11

25553 rows × 5 columns

In [15]:
#we are going to check for null values in the table: 

df2.isnull().sum()
Out[15]:
Unnamed: 0       0
Content ID       0
User ID       3019
Type           980
Datetime         0
dtype: int64
In [16]:
#we can drop the null values using the .dropna() method:

df2.dropna(inplace = True)
In [17]:
df2.isnull().sum()
Out[17]:
Unnamed: 0    0
Content ID    0
User ID       0
Type          0
Datetime      0
dtype: int64
In [18]:
df2
Out[18]:
Unnamed: 0 Content ID User ID Type Datetime
1 1 97522e57-d9ab-4bd6-97bf-c24d952602d2 5d454588-283d-459d-915d-c48a2cb4c27f disgust 2020-11-07 09:43:50
2 2 97522e57-d9ab-4bd6-97bf-c24d952602d2 92b87fa5-f271-43e0-af66-84fac21052e6 dislike 2021-06-17 12:22:51
3 3 97522e57-d9ab-4bd6-97bf-c24d952602d2 163daa38-8b77-48c9-9af6-37a6c1447ac2 scared 2021-04-18 05:13:58
4 4 97522e57-d9ab-4bd6-97bf-c24d952602d2 34e8add9-0206-47fd-a501-037b994650a2 disgust 2021-01-06 19:13:01
5 5 97522e57-d9ab-4bd6-97bf-c24d952602d2 9b6d35f9-5e15-4cd0-a8d7-b1f3340e02c4 interested 2020-08-23 12:25:58
... ... ... ... ... ...
25547 25547 75d6b589-7fae-4a6d-b0d0-752845150e56 b6d04982-1509-41ab-a700-b390d6cb4d02 worried 2020-10-31 04:50:14
25548 25548 75d6b589-7fae-4a6d-b0d0-752845150e56 80c9ce48-46f9-4f5e-b3ca-3b698fc2e949 dislike 2020-06-27 09:46:48
25549 25549 75d6b589-7fae-4a6d-b0d0-752845150e56 2bd9c167-e06c-47c1-a978-3403d6724606 intrigued 2021-02-16 17:17:02
25551 25551 75d6b589-7fae-4a6d-b0d0-752845150e56 5ffd8b51-164e-47e2-885e-8b8c46eb63ed worried 2020-11-04 20:08:31
25552 25552 75d6b589-7fae-4a6d-b0d0-752845150e56 4edc3d1a-a7d9-4db6-89c3-f784d9954172 cherish 2021-01-04 04:55:11

22534 rows × 5 columns

After dropping the null values, we are left with 22534 rows instead of the initial 25553¶
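Worth noting: a targeted `.dropna(subset=...)` would discard only the rows that are unusable for the analysis. Since the User ID column is dropped in the next step anyway, keeping rows that lack only a User ID would retain more reactions. A minimal sketch on a hypothetical mini-table (not the real Reactions data):

```python
import numpy as np
import pandas as pd

# hypothetical mini reactions table: one row lacks only a User ID, one lacks a Type
df = pd.DataFrame({
    'Content ID': ['x', 'x', 'y'],
    'User ID': [np.nan, 'u2', 'u3'],
    'Type': ['like', np.nan, 'heart'],
})

# drop only rows whose reaction Type is missing; rows missing just a User ID survive
kept = df.dropna(subset=['Type'])
print(len(kept))  # 2
```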

In [20]:
#the User ID column is not needed for this analysis, so we drop it:

df2d = df2.drop(columns = ['User ID'])
df2d
Out[20]:
Unnamed: 0 Content ID Type Datetime
1 1 97522e57-d9ab-4bd6-97bf-c24d952602d2 disgust 2020-11-07 09:43:50
2 2 97522e57-d9ab-4bd6-97bf-c24d952602d2 dislike 2021-06-17 12:22:51
3 3 97522e57-d9ab-4bd6-97bf-c24d952602d2 scared 2021-04-18 05:13:58
4 4 97522e57-d9ab-4bd6-97bf-c24d952602d2 disgust 2021-01-06 19:13:01
5 5 97522e57-d9ab-4bd6-97bf-c24d952602d2 interested 2020-08-23 12:25:58
... ... ... ... ...
25547 25547 75d6b589-7fae-4a6d-b0d0-752845150e56 worried 2020-10-31 04:50:14
25548 25548 75d6b589-7fae-4a6d-b0d0-752845150e56 dislike 2020-06-27 09:46:48
25549 25549 75d6b589-7fae-4a6d-b0d0-752845150e56 intrigued 2021-02-16 17:17:02
25551 25551 75d6b589-7fae-4a6d-b0d0-752845150e56 worried 2020-11-04 20:08:31
25552 25552 75d6b589-7fae-4a6d-b0d0-752845150e56 cherish 2021-01-04 04:55:11

22534 rows × 4 columns

In [21]:
#let's rename the Type column to ReactionType:

df2d.rename(columns = {'Type':'ReactionType'}, inplace = True)
In [22]:
df2d.head(5)
Out[22]:
Unnamed: 0 Content ID ReactionType Datetime
1 1 97522e57-d9ab-4bd6-97bf-c24d952602d2 disgust 2020-11-07 09:43:50
2 2 97522e57-d9ab-4bd6-97bf-c24d952602d2 dislike 2021-06-17 12:22:51
3 3 97522e57-d9ab-4bd6-97bf-c24d952602d2 scared 2021-04-18 05:13:58
4 4 97522e57-d9ab-4bd6-97bf-c24d952602d2 disgust 2021-01-06 19:13:01
5 5 97522e57-d9ab-4bd6-97bf-c24d952602d2 interested 2020-08-23 12:25:58

Reaction type table¶

In [23]:
#still following the same process used for the other tables:

df3 = pd.read_csv('ReactionTypes.csv')
df3
Out[23]:
Unnamed: 0 Type Sentiment Score
0 0 heart positive 60
1 1 want positive 70
2 2 disgust negative 0
3 3 hate negative 5
4 4 interested positive 30
5 5 indifferent neutral 20
6 6 love positive 65
7 7 super love positive 75
8 8 cherish positive 70
9 9 adore positive 72
10 10 like positive 50
11 11 dislike negative 10
12 12 intrigued positive 45
13 13 peeking neutral 35
14 14 scared negative 15
15 15 worried negative 12
In [24]:
df3.rename(columns = {'Type':'ReactionType'}, inplace = True)
In [25]:
df3.head(5)
Out[25]:
Unnamed: 0 ReactionType Sentiment Score
0 0 heart positive 60
1 1 want positive 70
2 2 disgust negative 0
3 3 hate negative 5
4 4 interested positive 30

Merging¶

After cleaning our data tables, the next action is to merge all the tables¶

In [26]:
#let's take a look at the tables we are working with again:

df1d.head(5)#content
Out[26]:
Unnamed: 0 Content ID ContentType Category
0 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying
1 1 9f737e0a-3cdd-4d29-9d24-753f4e3be810 photo healthy eating
2 2 230c4e4d-70c3-461d-b42c-ec09396efb3f photo healthy eating
3 3 356fff80-da4d-4785-9f43-bc1261031dc6 photo technology
4 4 01ab84dd-6364-4236-abbb-3f237db77180 video food
In [27]:
df2d.head(5)#reaction
Out[27]:
Unnamed: 0 Content ID ReactionType Datetime
1 1 97522e57-d9ab-4bd6-97bf-c24d952602d2 disgust 2020-11-07 09:43:50
2 2 97522e57-d9ab-4bd6-97bf-c24d952602d2 dislike 2021-06-17 12:22:51
3 3 97522e57-d9ab-4bd6-97bf-c24d952602d2 scared 2021-04-18 05:13:58
4 4 97522e57-d9ab-4bd6-97bf-c24d952602d2 disgust 2021-01-06 19:13:01
5 5 97522e57-d9ab-4bd6-97bf-c24d952602d2 interested 2020-08-23 12:25:58
In [28]:
df3.head(5)#reactiontype
Out[28]:
Unnamed: 0 ReactionType Sentiment Score
0 0 heart positive 60
1 1 want positive 70
2 2 disgust negative 0
3 3 hate negative 5
4 4 interested positive 30

we are going to merge the content and reaction tables¶

In [29]:
first_merge = pd.merge(df1d, df2d, on = 'Content ID')
In [30]:
first_merge.head(5)
Out[30]:
Unnamed: 0_x Content ID ContentType Category Unnamed: 0_y ReactionType Datetime
0 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying 1 disgust 2020-11-07 09:43:50
1 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying 2 dislike 2021-06-17 12:22:51
2 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying 3 scared 2021-04-18 05:13:58
3 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying 4 disgust 2021-01-06 19:13:01
4 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying 5 interested 2020-08-23 12:25:58
In [31]:
#we are going to deal with the 'Unnamed' columns later in this project. But first, we combine the first merge with the ReactionTypes table:
In [32]:
second_merge = pd.merge(first_merge, df3, on = 'ReactionType')
In [33]:
second_merge
Out[33]:
Unnamed: 0_x Content ID ContentType Category Unnamed: 0_y ReactionType Datetime Unnamed: 0 Sentiment Score
0 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying 1 disgust 2020-11-07 09:43:50 2 negative 0
1 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying 4 disgust 2021-01-06 19:13:01 2 negative 0
2 0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying 35 disgust 2021-04-09 02:46:20 2 negative 0
3 1 9f737e0a-3cdd-4d29-9d24-753f4e3be810 photo healthy eating 52 disgust 2021-03-28 21:15:26 2 negative 0
4 2 230c4e4d-70c3-461d-b42c-ec09396efb3f photo healthy eating 88 disgust 2020-08-04 05:40:33 2 negative 0
... ... ... ... ... ... ... ... ... ... ...
22529 997 435007a5-6261-4d8b-b0a4-55fdc189754b audio veganism 25489 adore 2020-10-04 22:26:33 9 positive 72
22530 997 435007a5-6261-4d8b-b0a4-55fdc189754b audio veganism 25491 adore 2020-09-18 10:50:50 9 positive 72
22531 998 4e4c9690-c013-4ee7-9e66-943d8cbd27b7 GIF culture 25512 adore 2020-10-31 03:58:44 9 positive 72
22532 998 4e4c9690-c013-4ee7-9e66-943d8cbd27b7 GIF culture 25524 adore 2020-06-25 15:12:29 9 positive 72
22533 998 4e4c9690-c013-4ee7-9e66-943d8cbd27b7 GIF culture 25531 adore 2020-12-17 16:32:57 9 positive 72

22534 rows × 10 columns

In [34]:
#all the columns that are 'Unnamed' should be dropped using the .drop() command:

second_merge.drop(columns = ["Unnamed: 0_x","Unnamed: 0_y","Unnamed: 0"],axis = 1,inplace = True)
In [35]:
second_merge
Out[35]:
Content ID ContentType Category ReactionType Datetime Sentiment Score
0 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying disgust 2020-11-07 09:43:50 negative 0
1 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying disgust 2021-01-06 19:13:01 negative 0
2 97522e57-d9ab-4bd6-97bf-c24d952602d2 photo studying disgust 2021-04-09 02:46:20 negative 0
3 9f737e0a-3cdd-4d29-9d24-753f4e3be810 photo healthy eating disgust 2021-03-28 21:15:26 negative 0
4 230c4e4d-70c3-461d-b42c-ec09396efb3f photo healthy eating disgust 2020-08-04 05:40:33 negative 0
... ... ... ... ... ... ... ...
22529 435007a5-6261-4d8b-b0a4-55fdc189754b audio veganism adore 2020-10-04 22:26:33 positive 72
22530 435007a5-6261-4d8b-b0a4-55fdc189754b audio veganism adore 2020-09-18 10:50:50 positive 72
22531 4e4c9690-c013-4ee7-9e66-943d8cbd27b7 GIF culture adore 2020-10-31 03:58:44 positive 72
22532 4e4c9690-c013-4ee7-9e66-943d8cbd27b7 GIF culture adore 2020-06-25 15:12:29 positive 72
22533 4e4c9690-c013-4ee7-9e66-943d8cbd27b7 GIF culture adore 2020-12-17 16:32:57 positive 72

22534 rows × 7 columns
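Before trusting the merged table, the joins can be sanity-checked: `pd.merge` supports an `indicator` flag that labels where each row came from, so unmatched keys become visible instead of silently disappearing. A sketch on hypothetical mini-tables (the real join key is Content ID, as above):

```python
import pandas as pd

# hypothetical mini-tables mirroring the Content / Reactions join
content = pd.DataFrame({'Content ID': ['a', 'b'], 'Category': ['dogs', 'food']})
reactions = pd.DataFrame({'Content ID': ['a', 'a', 'c'],
                          'ReactionType': ['like', 'heart', 'want']})

# an outer join with indicator=True surfaces keys present on only one side
check = pd.merge(content, reactions, on='Content ID', how='outer', indicator=True)
print(check['_merge'].value_counts())
```

Rows labelled `left_only` or `right_only` point at content with no reactions, or reactions whose content record is missing.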

Insights that will help answer the client's questions:¶

how many unique categories do we have?¶

how many reactions does the top category have?¶

which month has the most reactions?¶

In [36]:
#lets check the unique categories:

second_merge['Category'].unique()
Out[36]:
array(['studying', 'healthy eating', 'dogs', 'public speaking', 'science',
       'tennis', 'food', 'fitness', 'soccer', 'education', 'travel',
       'veganism', 'cooking', 'technology', 'animals', 'culture'],
      dtype=object)
In [37]:
second_merge['Category'].value_counts()
Out[37]:
animals            1738
science            1646
healthy eating     1572
technology         1557
food               1556
culture            1538
cooking            1525
travel             1510
soccer             1339
education          1311
fitness            1284
studying           1251
dogs               1227
tennis             1218
veganism           1146
public speaking    1116
Name: Category, dtype: int64
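The counts above rank categories by number of reactions. Since the brief asks for aggregate popularity based on the reaction scores, another defensible ranking is the summed Score per category (for the real data, that would be `second_merge.groupby('Category')['Score'].sum().nlargest(5)`). A self-contained sketch on a hypothetical mini-frame:

```python
import pandas as pd

# hypothetical mini-frame standing in for second_merge
df = pd.DataFrame({
    'Category': ['animals', 'animals', 'science', 'food'],
    'Score': [60, 70, 75, 50],
})

# rank categories by total reaction score instead of row count
top = df.groupby('Category')['Score'].sum().nlargest(2)
print(top)  # animals: 130, science: 75
```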
In [38]:
#now let's visualize the category counts in the data set using a bar chart:

sns.countplot(y = 'Category',data = second_merge, color = "darkgreen")
Out[38]:
<AxesSubplot:xlabel='count', ylabel='Category'>
In [39]:
#we can also visualize the breakdown of the categories in percentage terms using Plotly:

#to import Plotly:
import plotly.express as px
In [40]:
fig1 = px.pie(second_merge, names = 'Category', values = 'Score')
fig1.show()

we have 16 unique categories!¶

In [41]:
#We'll take a look at the top 5 categories:

t5 = second_merge['Category'].value_counts().head(5)
t5
Out[41]:
animals           1738
science           1646
healthy eating    1572
technology        1557
food              1556
Name: Category, dtype: int64
In [42]:
#we are going to convert the result to a dataframe:

t5a = t5.reset_index()
t5a
Out[42]:
index Category
0 animals 1738
1 science 1646
2 healthy eating 1572
3 technology 1557
4 food 1556
In [43]:
#We can rename the columns:

t5b = t5a.rename(columns = {'index':'Category','Category':'Score'})
In [44]:
t5b
Out[44]:
Category Score
0 animals 1738
1 science 1646
2 healthy eating 1572
3 technology 1557
4 food 1556
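The reset_index-then-rename dance above can be done in one chain: `Series.reset_index` accepts a name for the values column, and `rename_axis` names the index before it becomes a column (the notebook labels that column 'Score' for the chart; 'Count' is used here for clarity). A sketch on hypothetical category values:

```python
import pandas as pd

# hypothetical category values
s = pd.Series(['animals', 'animals', 'science'])

# name the index and the counts column in one chain, avoiding a separate rename
t5 = s.value_counts().head(5).rename_axis('Category').reset_index(name='Count')
print(t5)
```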
In [45]:
#the percentage share of the top 5 categories can also be visualized using Plotly:

fig2 = px.pie(t5b, names = 'Category', values = 'Score')
fig2.show()
In [46]:
second_merge['Datetime'].value_counts()
Out[46]:
2021-01-07 14:49:14    2
2020-06-27 06:28:56    2
2020-12-13 17:37:25    2
2020-08-10 18:01:52    2
2020-09-11 05:52:04    2
                      ..
2020-11-16 09:44:42    1
2020-08-30 08:00:42    1
2021-03-15 00:15:46    1
2021-05-03 04:36:19    1
2020-12-17 16:32:57    1
Name: Datetime, Length: 22524, dtype: int64
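One caveat: the Datetime column was read from CSV as plain strings. Parsing it explicitly with `pd.to_datetime` before extracting calendar components guards against silent mis-parsing. A sketch on hypothetical timestamps in the same format:

```python
import pandas as pd

# hypothetical timestamps in the same format as the Datetime column
s = pd.Series(['2020-11-07 09:43:50', '2021-06-17 12:22:51'])

# convert to datetime64 so month/year components can be extracted reliably
dt = pd.to_datetime(s)
print(dt.dt.month.tolist())  # [11, 6]
```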
In [47]:
months = pd.DatetimeIndex(second_merge['Datetime']).month.value_counts()
months
Out[47]:
5     1954
1     1949
8     1945
12    1941
10    1889
7     1884
11    1866
9     1862
3     1857
6     1836
4     1801
2     1750
Name: Datetime, dtype: int64
In [48]:
pd.DatetimeIndex(second_merge['Datetime']).month.value_counts().nlargest(1)
Out[48]:
5    1954
Name: Datetime, dtype: int64
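For presentation, the winning month number can be mapped to its name with the standard-library calendar module. A sketch using the top two counts computed above:

```python
import calendar

import pandas as pd

# month-number reaction counts, as computed above (top two shown)
counts = pd.Series({5: 1954, 1: 1949})

# map month numbers to names, then pick the busiest month
named = counts.rename(index=lambda m: calendar.month_name[m])
print(named.idxmax())  # May
```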

the month of May has the highest number of reactions¶

What's next?¶

The outcome of this project can help the company make informed decisions to improve its content offerings, prepare for a successful IPO, and establish a strong foundation for handling user data responsibly. Successfully implementing these insights can lead to increased user engagement, investor interest, and regulatory compliance.